Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Profiler / stack trace fixes #645

Merged
merged 32 commits into from
Nov 30, 2023
Merged

Conversation

ggcrunchy
Copy link
Contributor

This should fix stack traces while also keeping the new profiling features intact.


As with Lua code, where you do a couple of system.getTimer() or os.clock() calls and take the difference, the profiling logic must start and end somewhere. These are more complex than simple numbers, so an error in the middle could leave things in a broken state.

This would blow up for a couple reasons. Listeners were getting profiled, and this would include "unhandledError", which would interact with the aforementioned brokenness. Alternatively, you might have an error in a pcall()'d routine and run into a similar situation down the line.

I was trying to address this by pcall()ing the listeners' functions and, if necessary, cleaning up the state and relaying the error.

By handling errors that way, however, I was throwing away info that would propagate to the ultimate handler and give us a trace.


Also, pcall() is, by its nature, not as fast as a raw call, so some slowdown was introduced, albeit probably minor for most needs.


In the absence of errors, everything is handled by the _beginProfile() and _endProfile() calls. The problem is that, without something like our earlier pcall()s, errors will miss the _endProfile() calls on the way back.

There could be a failsafe in the unhandled error logic, but it would miss the cases where users themselves call pcall(), or errors in coroutines that are left on their stack rather than being thrown.

You could be inside a listener that was called by another listener (and so on), e.g. via a dispatchEvent() inside the "enterFrame", in the simplest case. A user might do a pcall() inside one of these listeners but have an error deeper in the chain: this should only unravel the state up to the point of said pcall().

Together with coroutines, these suggest we actually need to operate at the level of Lua's call stack.

What I ended up doing was adding a feature to "bookmark" the stack frames where some kind of deterministic state opens, in this case when a listener began, and that is closed when it ends. These are per-thread, so any coroutines will each have their own distinct sets.

Rather than penalize the average Lua call, these are added and removed—in the absence of an error—manually in _beginProfile() and _endProfile().

If a pcall() traps an error, these bookmarked frames will be peeled off and a handler will close any profiling states attached to them. Any frames below the pcall() site are left intact.

Coroutines are a little different due to how they leave the error information on their stack. In the case of an error or yield, all the bookmarked frames will be put to sleep (profiling states closed, but bookmarks left intact). A resume, on the other hand, will wake them back up (opening states).

The bookmark info consists of an index (where the frame is in its call stack) and a listener ID. These are each 16-bit integers that are bitwise or'd together. (This is only for compactness and are implementation details, so could be changed if it ever becomes limiting.)


Generating these IDs meant a redesign of the profiling logic, from a previous linked list of profiles to an array.

(This would probably have been a good idea anyhow, as with any sufficient number of profiles the search through the list would have gotten quite slow and wasn't cache-friendly besides.)

The C++-side listeners now each cache an ID on first use. On the simulator this means incrementing a comparison value at the beginning of each run to invalidate those IDs. On the Lua side, the ID is created on demand along with a table or function listener array.


Those profile arrays want Rtt_Allocators, so this called for some rearrangement for initialize and destroy logic. On Windows this worked fairly well, but I got a nasty surprise when testing on Mac. 😄 Basically, during relaunches there's a Finalizer() that gets called first followed by the new Initialize()... on Windows. On Mac it seems to do the latter then finalize, so using that to clean up some state in static variables was a problem. I ended up putting the offending bits into a new ProfilingState singleton that belongs to the Display and participates in its lifetime.

I moved a lot of stuff around and into that state. I left the sums alone (since they're fine as a linked list), as well as a couple methods, although they might be a strange fit now. Anyhow, fewer globals, so should be cleaner all told.


Attached are my tests:

profile_fix_tests.zip

I've commented out the actual profiling printouts and table listeners, just because it got really noisy, but they can be added back.

There's file output if you run it through the shell, which I did just to make sure it worked without the simulator-side invalidation of IDs.

Anyhow, if Rtt_Debug is defined a few helper routines seen in these tests are enabled, but not otherwise.

I tried to test a range of normal behavior, in both regular calls and coroutines, throwing errors or not, and can explain if there's any interest. The output can be a little weird since the call stack + profiling state won't actually be removed until you exit the current function, but knowing that it's possible to compare for correctness.

Unhandled errors (with a listener or otherwise) are actually called by Solar within a larger enclosing pcall(), so you will still see the aforementioned details there as well.


In theory the bookmark stuff could be used as a poor man's version of Lua 5.4's to-be-closed variables, pegged to listeners (the profiling states are basically one such resource). At the moment they're still pegged to a couple of profiling quirks, though.


Open issue( ? ): I don't know if debug hooks need any handling with respect to bookmarks.

ggcrunchy and others added 30 commits February 7, 2019 17:09
…nd buffer and assignment logic

Next up, prepare details from graphics.defineEffect()
…from Program

Slight redesign of TimeTransform with caching (to synchronize multiple invocations with a frame) and slightly less heavyweight call site
Err, was calling Modulo for sine method, heh

That said, time transform seems to work in basic test
Fixed error detection for positive number time transform inputs

Relaxed amplitude positivity requirement
Maintenance Revert Test
My mistake adding back changes made
Added bookmarking to Lua

Done automatically for coroutine yield and resume, or error on pcall

Can call manually otherwise

Will invoke a user-defined bookmark function in those scenarios; the manual part does that in normal code, only managing marks for the automatic cases

Redesigned profilings as array rather than list (ID / mark = array index)

IDs are precomputed / one-time-on-demand-computed rather than a lazy search
Off-by-one indexing error in bookmark function itself fixed

Helper functions for debugging
Overhauled profiling with new ProfilingState singleton, stored as part of display, rather than static variables

Used 0xFFFF rather than max calls constant in bookmark logic (since Mac complained)
@XeduR
Copy link
Contributor

XeduR commented Nov 30, 2023

@Shchvova You approved these changes, but they weren't yet merged. Was that an oversight or are you holding these until a later build?

@Shchvova
Copy link
Contributor

I wanted to get android fixes before merging it in to make distribute risks between releases. I think it’s ok to merge it bow

@Shchvova Shchvova merged commit de2ea49 into coronalabs:master Nov 30, 2023
1 check passed
@solares
Copy link

solares commented Dec 1, 2023

Happy to see this merged in! Did it make it into 3700 or will it be subsequent release?

@XeduR
Copy link
Contributor

XeduR commented Dec 1, 2023

It's not in 3700 yet. It'll be in the next one.

@solares
Copy link

solares commented Dec 1, 2023

Thanks @XeduR

@naveen-pcs
Copy link

I released a production build using 3701 that includes these changes and it seems like it's causing a new crash for my players.

[split_config.arm64_v8a.apk!libcorona.so] Rtt_Profiling.cpp - Rtt::Profiling::EntryRAII::EntryRAII(Rtt::ProfilingState&, int&, char const*)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants